Chinese Verb Sense Discrimination Using an EM Clustering Model with Rich Linguistic Features
Abstract
This paper discusses the application of the Expectation-Maximization (EM) clustering algorithm to the task of Chinese verb sense discrimination. The model uses rich linguistic features that capture predicate-argument structure information about the target verbs. A semantic taxonomy for Chinese nouns, built semi-automatically from two electronic Chinese semantic dictionaries, provides semantic features for the model. Purity and normalized mutual information were used to evaluate clustering performance on 12 Chinese verbs. The experimental results show that the EM clustering model can successfully learn sense or sense-group distinctions for most of the verbs. We further enhanced the model with certain fine-grained semantic categories called lexical sets. Our results indicate that these lexical sets improve the model's performance on the three most challenging verbs chosen from the first set of experiments.
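As an illustration of the two evaluation measures named in the abstract, the following is a minimal sketch of purity and normalized mutual information (NMI) for a clustering compared against gold sense labels. The function names `purity` and `nmi` are hypothetical, and the normalization used here (the geometric mean of the two entropies) is an assumption; the paper may normalize differently.

```python
import math
from collections import Counter


def purity(clusters, senses):
    """Fraction of instances that carry the majority sense of their cluster."""
    by_cluster = {}
    for c, s in zip(clusters, senses):
        by_cluster.setdefault(c, Counter())[s] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(senses)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def nmi(clusters, senses):
    """Mutual information divided by sqrt(H(clusters) * H(senses)).

    Assumption: geometric-mean normalization; other variants exist.
    """
    n = len(senses)
    joint = Counter(zip(clusters, senses))
    cluster_sizes = Counter(clusters)
    sense_sizes = Counter(senses)
    mi = sum(
        (k / n) * math.log(k * n / (cluster_sizes[c] * sense_sizes[s]))
        for (c, s), k in joint.items()
    )
    denom = math.sqrt(entropy(clusters) * entropy(senses))
    return mi / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy data: six verb instances, two induced clusters, two gold senses.
    induced = [0, 0, 0, 1, 1, 1]
    gold = ["a", "a", "b", "b", "b", "b"]
    print(purity(induced, gold), nmi(induced, gold))
```

Both measures compare cluster assignments with gold labels without requiring cluster identifiers to match sense identifiers, which is why they suit unsupervised sense discrimination.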
Similar resources
Features of Verb Complements in Co-composition: A case study of Chinese baking verb using Weibo corpus
In the Generative Lexicon Theory (GLT), co-composition is one of the generative devices proposed to explain the cases of verbal polysemous behavior where more than one function application is allowed. The English baking verbs were used as one of the examples to illustrate how their complements co-specify the verb with qualia unification. In this paper, we begin by exploring the polysemy of Chin...
AFAST: An Automatic Frames Acquisition System
This paper describes an unsupervised strategy for acquiring lexico-semantic frames (LSFs) of verbs from syntactically parsed corpora. The problems in acquiring LSFs include verb sense ambiguity, the diversity of linguistic usages, and the lack of complete frame slots in a single sentence. We propose a specific clustering technique based on the Minimum Description Length (MDL) princ...
Supervised Morphology Generation Using Parallel Corpus
Translating from English, a morphologically poor language, into morphologically rich languages such as Persian comes with many challenges. In this paper, we present an approach to rich morphology prediction using a parallel corpus. We focus on verb conjugation as the most important and problematic morphological phenomenon in Persian. We define a set of linguistic features usi...
Verb Sense and Subcategorization: Using Joint Inference to Improve Performance on Complementary Task
We propose a general model for joint inference in correlated natural language processing tasks when fully annotated training data is not available, and apply this model to the dual tasks of word sense disambiguation and verb subcategorization frame determination. The model uses the EM algorithm to simultaneously complete partially annotated training sets and learn a generative probabilistic mod...
Aligning Features with Sense Distinction Dimensions
In this paper we present word sense disambiguation (WSD) experiments on ten highly polysemous verbs in Chinese, where significant performance improvements are achieved using rich linguistic features. Our system performs significantly better, and in some cases substantially better, than the baseline on all ten verbs. Our results also demonstrate that features extracted from the output of an auto...